Method for evaluating uniformity of spots on an array

ABSTRACT

A method for evaluating the uniformity of spots on a DNA microarray having a plurality of spots, these spots undergoing specific emissions as a result of the hybridization of target DNA and tagged probe DNA, by examining whether patterns having periodicity are manifested in a sequence BG composed of background data obtained for the spots.

FIELD OF THE INVENTION

The present invention relates to a method for evaluating the uniformity of spots printed on an array, and is a technique supporting the analysis of an array having spots located two-dimensionally on a substrate, such as a DNA microarray, a DNA chip, a protein array, etc.

BACKGROUND OF THE INVENTION

Research involving the analysis of gene information, as typified by the Human Genome Project, is occurring at an ever faster pace worldwide, bringing with it an increasing need for new methodologies capable of efficiently analyzing expressions at the in-vivo gene level.

A new method for measuring gene expression levels in cells is a DNA microarray, wherein several hundred to several tens of thousands of samples of DNA are aligned and fixed in spots in a matrix shape on a glass slide. mRNA (the target) that has been extracted and purified from target cells is hybridized on the DNA microarray.

Fundamentally, the common method for performing measurements using DNA microarrays is two-color fluorescence labeling. In this method, mRNA originating from two types of cells (for example, normal cells and cancer cells) is extracted and purified, and the samples are labeled with fluorescent materials (CY3 and CY5) that have mutually differing excitation wavelengths. Then, competitive hybridization is performed on the same spots on the DNA microarray, and the fluorescent intensity of each spot on the array is measured by using two channels (CH1 and CH2) to view the mutually differing excitation wavelengths of CY3 and CY5. By this means, the relative gene expression levels of the two types of cells are measured. In practice, after the fluorescent signals have been measured, the measurement results are superposed and color analysis is performed.

The data from CY3 and CY5 is normally calculated based on the following relational expressions, with essentially the value of log (R/G) being utilized as the gene expression data.

(Sample A CY3 fluorescence data)

-   CH1 − CH1B (Channel 1 background) = data R of CY3 (red)

(Sample B CY5 fluorescence data)

-   CH2 − CH2B (Channel 2 background) = data G of CY5 (green)

Here, CH1 (channel 1) and CH2 (channel 2) are the measured fluorescent intensity values of the spots (channels 1 and 2 existing so as to measure red and green separately), measured using a laser scanner. Further, CH1B (background data of channel 1) and CH2B (background data of channel 2) are background data of the spots measured using a laser scanner.

Gene spots with a greater degree of gene expression in Sample A show as red, spots with a greater degree of gene expression in Sample B show as green, and spots with an approximately equal degree of gene expression show as yellow. That is, the spots show the following colors, in accordance with the ratio of R to G:

-   R/G>1 Red
-   R/G=1 Yellow
-   R/G<1 Green
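By way of illustration only, the background correction and the color rule above can be sketched as follows. The function names, the tolerance parameter `tol`, and the choice of base 2 for the logarithm are assumptions made for this sketch; the text above specifies only log (R/G) without naming a base.

```python
import math


def corrected(intensity, background):
    """Background-corrected value, e.g. R = CH1 - CH1B or G = CH2 - CH2B."""
    return intensity - background


def spot_color(r, g, tol=0.0):
    """Classify a spot by the ratio R/G (tol is a hypothetical tolerance band)."""
    ratio = r / g
    if ratio > 1 + tol:
        return "red"
    if ratio < 1 - tol:
        return "green"
    return "yellow"


def log_ratio(r, g):
    """log2(R/G); returns None when the ratio is undefined (non-positive data)."""
    if r <= 0 or g <= 0:
        return None
    return math.log2(r / g)


r = corrected(1200, 200)   # channel 1: CH1 - CH1B
g = corrected(700, 200)    # channel 2: CH2 - CH2B
print(spot_color(r, g), log_ratio(r, g))
```

Returning None for non-positive corrected values is one possible design choice; it makes visible the spots whose background exceeds the measured intensity instead of silently discarding them.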

Research based on DNA microarray data, such as the analysis of periodicity between genes, gene expression networks, and gene transfer control cascades, is being developed, and the mathematical and informational methods of this second-generation research crucially require improved accuracy. A high degree of reliability is therefore required of the data from DNA microarrays.

Further, highly accurate data is required in the case where cancer is diagnosed on the basis of gene expression data.

However, gene analysis using DNA microarrays has only recently begun, and there are many cases where the issue of reproducibility needs to be resolved.

In particular, it is known that printing, hybridization, processing of the slide surface, etc. readily cause changes in the shape and size of the spots, and that mechanical influences during the spotting process, such as the minute displacement, vibration, etc. of the printing pins or platform, readily cause changes in the position of the spots.

Although the uniformity of the spots on the DNA microarray is an important factor that affects the accuracy of signal data, a method for evaluating this uniformity does not exist.

The present invention presents an extremely simple method for evaluating the uniformity of the spots on an array such as a DNA microarray, etc.

DISCLOSURE OF THE INVENTION

Portions of the gene expression data from DNA microarrays have been made public for researchers to use. Such gene expression data has been made public by Stanford University, MIT, and Harvard University, and Stanford University maintains a DNA microarray database. In 1997, Professor Brown's group at Stanford University succeeded in analyzing, for the first time in the world, the total gene expression (6400 genes) of the yeast cell (this is also public data).

The inventors, who were interested in whether there was some regularity in the gene expression quantities, obtained the total gene expression data of the yeast cell (TUP1) that was made public in the DNA microarray database of Stanford University, rearranged the expression data in the chromosome order and gene order, and analyzed the gene expression data thus obtained.

The result was that, on multiplying by 2, the presence of weak periodicity was ascertained. However, these periods did not originate from the gene expression levels, but appeared due to the influence of background data.

The present inventors have discovered a relationship between the presence of periodicity in the background data (this background data being used as fluorescence intensity compensatory data) and the non-uniformity of spots, and have discovered that the presence of periodicity in the background data furthermore exerts an important influence on the analysis of gene expression data. The present invention has resulted from the further discovery by the inventors that this information can be applied not only to DNA microarrays, but also to all arrays having spots located two-dimensionally on a substrate, such as DNA chips, protein arrays, etc.

That is, the present invention is a method for evaluating the uniformity of spots on an array having a plurality of spots, these spots undergoing specific emissions as a result of the hybridization of target matter and tagged probe matter, wherein the uniformity of the spots is evaluated by examining whether patterns having periodicity are manifested in a sequence BG comprising background data obtained in a manner described below.

<Method for Preparing the Sequence BG>

-   (1) By applying analysis software to images obtained by scanning the monochromatic emission of the array, background data for each spot are obtained.
-   (2) Concerning each spot, the corresponding plate No.(α) of the target matter and position (β, γ) are determined, wherein α is a symbol or a numerical symbol identifying the plate while β and γ respectively stand for the row and column of a matrix formed by plate holes.
-   (3) The No.(α) and the position (β, γ) on the plate are assigned to respective background data.
-   (4) The sequence BG consisting of the background data aligned in the orders of the plate No.(α) (the first priority) and the position (β, γ) on the plate (the second priority) is obtained.
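Steps (2) to (4) above amount to sorting the per-spot background data by plate and then by position on the plate. A minimal sketch follows, assuming each spot is held as a dictionary; the keys `PLAT`, `PROW`, and `PCOL` follow the column names described for FIG. 12, while the function name `build_sequence_bg` and the record layout are hypothetical.

```python
def build_sequence_bg(spots, channel="CH1B"):
    """Return the sequence BG: background values ordered by plate No. (first
    priority), then by row and column on the plate (second priority)."""
    ordered = sorted(spots, key=lambda s: (s["PLAT"], s["PROW"], s["PCOL"]))
    return [s[channel] for s in ordered]


# Hypothetical spots, listed in array (slide) order rather than plate order.
spots = [
    {"PLAT": 2, "PROW": "A", "PCOL": 1, "CH1B": 30},
    {"PLAT": 1, "PROW": "B", "PCOL": 1, "CH1B": 20},
    {"PLAT": 1, "PROW": "A", "PCOL": 2, "CH1B": 11},
    {"PLAT": 1, "PROW": "A", "PCOL": 1, "CH1B": 10},
]
sequence_bg = build_sequence_bg(spots)
print(sequence_bg)
```

The sort assumes that plate identifiers are mutually comparable (all numbers or all strings), and likewise for rows and columns.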

According to the present invention, the uniformity of spots on an array such as a DNA microarray or the like can be evaluated in an extremely simple manner.

According to the method of the present invention, the signal data obtained from an array such as a DNA microarray, or the like, can be analyzed with great accuracy by evaluating the uniformity of the spots on the array in advance.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.

FIG. 1 shows a flow chart displaying the commonly used steps for analyzing signal data using a DNA microarray.

FIG. 2 shows a commonly seen distribution of one spot within an allocated square.

FIG. 3 shows an example of sequence I_(j) (j=1, . . . , 100) displayed in a matrix group.

FIG. 4 shows an example of background data (CH2B) of yeast cell DNA microarray data displayed as a colored matrix, the relationship between CH2B and the colors being shown in the lower row.

FIG. 5 shows an example of background data (CH1B) of the yeast cell DNA microarray data displayed as a colored matrix, the relationship between CH1B and the colors being shown in the lower row.

FIG. 6 shows signal data (CH1D) of channel 1 in a DNA microarray image wherein color is converted in accordance with the size of number values (Embodiment 1).

FIG. 7 shows signal data (CH2D) of channel 2 in a DNA microarray image wherein color is converted in accordance with the size of number values (Embodiment 1).

FIG. 8 shows an example of background data (CH1B) of melanoma DNA displayed as a colored matrix, the relationship between CH1B and the colors being shown in the lower row.

FIG. 9 shows an example of background data (CH2B) of the melanoma DNA displayed as a colored matrix, the relationship between CH2B and the colors being shown in the lower row.

FIG. 10 shows signal data (CH1D) of channel 1 in a DNA microarray image wherein color is converted in accordance with the size of number values (Embodiment 2).

FIG. 11 shows signal data (CH2D) of channel 2 in a DNA microarray image wherein color is converted in accordance with the size of number values (Embodiment 2).

FIG. 12 shows an outline of the yeast cell DNA microarray data.

PREFERRED ASPECT TO EMBODY THE INVENTION

Below, a description is given using a DNA microarray as an example. A flow chart displaying the commonly used steps for analyzing signal data using a DNA microarray is as shown in FIG. 1.

<Background Data>

When scanning is performed on a slide on which hybridization has been completed, the color status of each spot on the slide is recorded as picture image data. Then, the picture image data is processed using analysis software, and color data of the spots is acquired. Specifically, the image obtained by scanning is overlaid with a grid image having a plurality of squares, one square being allocated for each spot, and then the signal data and background data within each square are acquired (FIG. 2).

Usually, the signal data is measured as the color intensity within an oval-shaped spot wherein a major axis and minor axis have been designated. The background data is measured as the intensity within the square surrounding the spot and in the area outside the boundaries of the spot.
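The measurement described above can be sketched as follows for a single grid square. This is an illustration under stated assumptions and not the procedure of any particular analysis software: the square is taken to be a list of pixel rows, the spot an axis-aligned ellipse with center (cx, cy) and semi-axes a and b, and the mean is used as the summary statistic. The function name `measure_spot` is hypothetical.

```python
def measure_spot(square, cx, cy, a, b):
    """Return (signal, background) for one grid square.

    Pixels inside the ellipse are treated as the spot; pixels inside the
    square but outside the ellipse are treated as background.
    """
    inside, outside = [], []
    for y, row in enumerate(square):
        for x, value in enumerate(row):
            in_spot = ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0
            (inside if in_spot else outside).append(value)
    signal = sum(inside) / len(inside) if inside else None
    background = sum(outside) / len(outside) if outside else None
    return signal, background


# Hypothetical 5 x 5 square: a small bright spot (100) on a dim surround (10).
square = [[100 if abs(x - 2) + abs(y - 2) <= 1 else 10 for x in range(5)]
          for y in range(5)]
signal, background = measure_spot(square, cx=2, cy=2, a=1, b=1)
print(signal, background)
```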

The background data should be measured when all the spots are visible and emit at a brightness whereby the emission intensity is not saturated. There can be a slight variation in the background data depending on the detecting conditions of the emission signal of the scanning device or the method used by the analysis software for sampling the spots. However, the method of the present invention is not affected by the scanning device or the type of analysis software.

Preferred scanning devices are GenePix 4000A, GeneTACLSIV, GTMASS, GMS418Array Scanner, AvalancheMicroscanner, ChipReader, GeneTAC2000, CRBIO, ScanArray3000, 4000, 5000, etc.

Preferred analysis software is ScanAlyze, ArrayAnalyzer, ImaGene, AutoGene, QuantArray, QuantarrayAutomation, MicroArraySuite, ArrayVision, ArrayGauge, GenePixPro, etc.

<Method for Preparing Sequence BG>

In the present invention, a sequence BG serving as an indicator for revealing the characteristics of the background data is prepared as follows.

-   (1) By applying analysis software to images obtained by scanning the monochromatic emission of the DNA microarray, background data for each spot are obtained.
-   (2) Concerning each spot, the corresponding plate No.(α) of the target DNA and position (β, γ) are determined, wherein α is a symbol or a numerical symbol identifying the plate while β and γ respectively stand for the row and column of a matrix formed by plate holes.
-   (3) The No.(α) and the position (β, γ) on the plate are assigned to respective background data.
-   (4) The sequence BG consisting of the background data aligned in the orders of the plate No.(α) (the first priority) and the position (β, γ) on the plate (the second priority) is obtained.

<Display Methods for Patterns in the Sequence BG>

There is no particular restriction on the display methods of the present invention as long as these can be used to determine the presence in the sequence BG of patterns having periodicity.

Preferred display methods are the methods below.

(Display Method 1)

A method whereby a sub-sequence formed from 1 or more elements is extracted from a sequence, each number contained in the sub-sequence forming a color dot, wherein hue, luminosity, saturation, or a combination thereof, is defined by each type of number, the color dot further being sequentially output in a color dot matrix arranged in a matrix shape, a color pattern obtained from the output of the color dot matrix causing the intrinsic regularity to be revealed.

(Display Method 2)

A method whereby a sequence is divided into a plurality of sub-sequences, each number contained in the divided sub-sequences forming a sub-color dot column wherein hue, luminosity, saturation, or a combination thereof, is defined by each type of number, the sub-color dot columns being arranged in an aligned manner to output a color dot matrix wherein the color dots are arranged in a matrix shape, a color pattern obtained from the output of the color dot matrix causing latent characteristics within the sequence to be revealed.

A more preferred method is display method 3 below.

(Display Method 3)

A method whereby, in display method 1 or display method 2, each number forming a sequence I_(j) (j=1, . . . , m) is arranged according to the following positioning pattern:

$$
\begin{array}{l}
(j = 1,\ 2,\ 3,\ \ldots,\ k) \\
(j = k+1,\ k+2,\ k+3,\ \ldots,\ k+k) \\
\qquad \vdots \\
(j = (n-1)k+1,\ (n-1)k+2,\ (n-1)k+3,\ \ldots,\ (n-1)k+k) \\
(j = nk+1,\ nk+2,\ nk+3,\ \ldots,\ nk+k)
\end{array}
$$

(here, k is an integer of 2 or more, n is a natural number such that nk+1 ≤ m ≤ nk+k), a color dot matrix is output, latent characteristics within the sequence being revealed.
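The positioning pattern of display method 3 is a row-by-row arrangement of the sequence with k elements per row, the last row possibly being incomplete. A minimal sketch follows; the character ramp used in `render` is only a stand-in for the color dots (hue, luminosity, saturation) of the method, and both function names are hypothetical.

```python
def arrange_rows(sequence, k):
    """Split the sequence I_j (j = 1..m) into consecutive rows of k elements."""
    if k < 2:
        raise ValueError("k must be an integer of 2 or more")
    return [sequence[i:i + k] for i in range(0, len(sequence), k)]


def render(rows, ramp=" .:*#"):
    """Map each number to a character by magnitude, one text line per row."""
    values = [v for row in rows for v in row]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero for constant data
    return "\n".join(
        "".join(ramp[int((v - low) * (len(ramp) - 1) / span)] for v in row)
        for row in rows
    )


# A hypothetical sequence with period 3, displayed with k = 3:
rows = arrange_rows([1, 5, 9, 1, 5, 9, 1, 5], 3)
print(render(rows))
```

When k matches the period of the sequence, equal values fall in the same column, so the repeat appears as vertical stripes in the output.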

An even more preferred method is display method 4 below.

(Display Method 4)

A method whereby, when p is any given natural number less than m, and r is any given natural number, when the display method of display method 3 is implemented while substituting k=p, p+r, p+2r, p+3r, . . . , a color dot matrix group is output wherein a color dot matrix of the p column, a color dot matrix of the p+r column, and color dot matrices of the p+2r, p+3r . . . columns, as below, are all arranged in an aligned manner, latent characteristics within the sequence being revealed.

Display method 4 is particularly effective in the case where repeated units are totally unclear, or in the case where a portion of a sequence simply does not exist in repeated regions.

FIG. 3 schematically shows a method for arranging each element in sequence I_(j) (j=1, . . . , 100) obtained by means of method 4 as matrix groups consisting of the matrices k=1, k=2, k=3, k=4, . . . and k=20.
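The matrix group of display method 4 can be sketched by repeating the row arrangement for k = p, p+r, p+2r, and so on. The upper limit `k_max` is an assumption introduced for this sketch, since the text leaves the end of the series open, and the function name `matrix_group` is hypothetical. The example mirrors FIG. 3 (100 elements, k = 1 to 20).

```python
def matrix_group(sequence, p, r, k_max):
    """Return {k: rows} for k = p, p + r, p + 2r, ... up to k_max.

    Each value is the sequence split into consecutive rows of k elements.
    """
    if p < 1 or r < 1 or p >= len(sequence):
        raise ValueError("need natural numbers p and r with p < len(sequence)")
    return {
        k: [sequence[i:i + k] for i in range(0, len(sequence), k)]
        for k in range(p, k_max + 1, r)
    }


group = matrix_group(list(range(1, 101)), p=1, r=1, k_max=20)
for k in (3, 20):
    print(k, len(group[k]), group[k][-1])
```

Viewing the matrices side by side lets the column count at which a pattern lines up vertically be picked out without knowing the repeat unit in advance.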

<Method for Evaluating Uniformity of Spots>

When the sequence BG has been displayed by means of a suitable display method, it is verified whether patterns having periodicity are present. In the case where patterns having periodicity are present, it can be determined that the spots on the DNA microarray have low uniformity.

Various types of patterns having periodicity can be present in the sequence BG, such as a constantly repeated pattern or a plurality of types of patterns repeated across the entire sequence BG, patterns being repeated in portions of the sequence BG, etc.

If patterns having periodicity are not present in the sequence BG, it can be determined that the spots were printed uniformly, and consequently highly accurate data analysis is possible. On the other hand, if patterns having periodicity are present in even a portion of the sequence BG, the printing conditions of the spots were not uniform. Consequently the signal data has low reliability, and is influenced by compensatory factors, namely the patterns having periodicity of the background data. Since periodic noise is then included in the signal data, precise analysis is difficult.
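The evaluation described above is made by inspecting the displayed pattern. As a numerical complement, and not as part of the method set forth here, a candidate period can also be screened with a sample autocorrelation; this is a substituted technique, and the function names and the lag range are assumptions of this sketch.

```python
def autocorrelation(sequence, lag):
    """Sample autocorrelation of the sequence at the given lag (0.0 if constant)."""
    n = len(sequence)
    mean = sum(sequence) / n
    variance = sum((x - mean) ** 2 for x in sequence)
    if variance == 0:
        return 0.0
    covariance = sum(
        (sequence[i] - mean) * (sequence[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def strongest_period(sequence, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    lags = range(min_lag, min(max_lag, len(sequence) - 1) + 1)
    return max(lags, key=lambda lag: autocorrelation(sequence, lag))


# Hypothetical background sequence repeating every 4 elements.
sequence_bg = [0, 1, 2, 3] * 10
print(strongest_period(sequence_bg, 2, 12))
```

A high autocorrelation at some lag only suggests a period worth displaying with display method 3; whether the spots are judged non-uniform remains a matter of examining the resulting pattern.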

Below, the method of the present invention is described more concretely.

<Embodiment 1>

(Yeast Cell DNA Microarray)

The yeast cell DNA microarray data from the DNA microarray data published by Stanford University was obtained (Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes, Array Data File: y11n121 (variable heat 21C)).

An outline of the data that was obtained is shown in FIG. 12. In FIG. 12, ‘CH1B’ is background data of channel 1, and ‘CH2B’ is background data of channel 2. ‘CH1D’ and ‘CH2D’ are compensated signal data of the spots obtained according to the following formulas.

CH1I − CH1B = CH1D

CH2I − CH2B = CH2D

Further, in FIG. 12, ‘PLAT’ is a symbol or number identifying plates, ‘PROW’ is a symbol showing the rows of each plate, and ‘PCOL’ is a number showing the columns of each plate.

In FIG. 12, data concerning specific target DNA is displayed in row units. A plate number (‘PLAT’) and position on the plate (‘PROW’, ‘PCOL’) are assigned to the background data ‘CH1B’ and ‘CH2B’ for each item of target DNA displayed in the ‘NAME’ column. Each item of data is listed first in the plate number unit, then in the plate row unit, and finally in the plate column unit. Consequently, the order from top to bottom in FIG. 12 corresponds to the order specified in step (4) of the method for preparing the sequence BG of the present invention.

First, the background data (CH1B) is listed in sequence in the column ‘CH1B’ of FIG. 12 from the top line to the bottom line to form the sequence BG, this being displayed according to display method 4. Consequently, the presence of repeat structures of 384 units in the sequence BG is shown. This result is shown in FIG. 5, wherein each number in the sequence BG is displayed in a matrix shape according to display method 3 (here, k=384).

Similarly, FIG. 4 shows the result wherein each number in the sequence BG of the background data (CH2B) is displayed in a matrix shape according to display method 3.

It can be seen from these figures that, in both the sequences BG of the background data CH1B and CH2B, periodically fluctuating patterns having distinct periodicity are present as repeat units of 384.

FIG. 6 shows the signal data ‘CH1D’ of channel 1 in a DNA microarray image wherein color is converted in accordance with the size of number values. Further, FIG. 7 shows a DNA microarray image of the signal data ‘CH2D’ of channel 2 processed in the same way. These DNA microarray images are not necessarily identical to actual scanned images, but schematically show the emission intensity of the spots in scanned images. The non-uniformity of the spots cannot be recognized at all from FIGS. 6 and 7, but since the distinct repeatability in the background data can be recognized from FIGS. 4 and 5, the non-uniformity of the spots can be determined.

Since the background data are utilized as compensatory number values of the fluorescent intensity, if there is periodicity in the background data itself, periodicity originating from the background data will necessarily be present in the gene expression data of the DNA microarray. Since this type of periodicity in the background data affects the reliability of the DNA microarray data, care is required during data analysis.
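The propagation described above can be seen in a small worked example: when a periodic background is subtracted from raw intensities that carry no periodicity of their own, the compensated data inherits the period of the background. The numbers below are invented for illustration, and the function names are hypothetical.

```python
def subtract_background(intensity, background):
    """Compensated signal, element by element (e.g. CH1D = CH1I - CH1B)."""
    return [i - b for i, b in zip(intensity, background)]


def repeats_with_period(sequence, k):
    """True if every element equals the element k positions later."""
    return all(sequence[i] == sequence[i + k] for i in range(len(sequence) - k))


raw = [1000] * 12                    # flat raw intensity, no periodicity
background = [50, 60, 70, 60] * 3    # background repeating every 4 spots
compensated = subtract_background(raw, background)
print(compensated)
print(repeats_with_period(compensated, 4))
```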

<Embodiment 2>

(Melanoma DNA Microarray)

The melanoma DNA microarray data from the DNA microarray data published by Stanford University was obtained (NC160 Cancer Microarray Project).

As with embodiment 1, background data (CH1B) is prepared, and is displayed according to display method 4. Consequently, the presence of repeat structures of 96 units in a sequence BG is shown (although it has also been suggested that repeat structures of 24 units are intrinsic to the sequence BG). FIG. 8 shows this result, wherein each number in the sequence BG is displayed in a matrix shape according to display method 3 (here, k=96).

Similarly, FIG. 9 shows the result wherein each number in the sequence BG of the background data (CH2B) is displayed in a matrix shape according to display method 3.

In these figures, a plurality of vertical lines (that is, repeats of 96 units) are displayed distinctly, and it can be understood that patterns having periodicity are present in the sequences BG of the background data of CH1B and CH2B.

As with embodiment 1, the signal data ‘CH1D’ of channel 1 and the signal data ‘CH2D’ of channel 2 are shown in DNA microarray images in FIGS. 10 and 11. The non-uniformity of spots therein cannot be recognized at all from FIGS. 10 and 11, but since the distinct repeatability in the background data can be recognized from FIGS. 8 and 9, the non-uniformity of the spots can be determined.

According to the present invention, the uniformity of the spots on an array such as a DNA microarray or the like can be evaluated in an extremely simple manner.

According to the method of the present invention, by evaluating in advance the uniformity of the spots on the array, the signal data obtained from an array such as a DNA microarray or the like can be analyzed with great accuracy.

Specific examples of embodiments of the present invention are presented above, but these merely illustrate some possibilities of the invention and do not restrict the claims thereof. The art set forth in the claims includes transformations and modifications to the specific examples set forth above.

Furthermore, the technical elements disclosed in the present specification or figures may be utilized separately or in all types of conjunctions and are not limited to the conjunctions set forth in the claims at the time of submission of the application. Furthermore, the art disclosed in the present specification or figures may be utilized to simultaneously realize a plurality of aims or to realize one of these aims.

1. A method for evaluating a uniformity of spots on an array having a plurality of spots, these spots undergoing specific emissions as a result of the hybridization of target matter and tagged probe matter, wherein the uniformity of the spots is evaluated by examining whether patterns having periodicity are manifested in a sequence BG comprising background data obtained, the method comprising: (1) obtaining background data for each spot by applying an analysis software to images obtained by scanning the monochromatic emission of the array; (2) for each spot, the corresponding plate No.(α) of the target matter and position (β, γ) are determined, wherein α is a symbol or a numerical symbol identifying the plate while β and γ respectively stand for the row and column of a matrix formed by plate holes; (3) the No.(α) and the position (β, γ) on the plate are assigned to respective background data; and (4) the sequence BG consisting of the background data aligned in the orders of the plate No.(α) (the first priority) and the position (β, γ) on the plate (the second priority) is obtained.
2. A method as set forth in claim 1, the method being characterized in that the patterns in the sequence BG are displayed according to display method 1 or display method 2 described below, whereby it is determined whether the patterns having periodicity are manifested in the sequence BG, where display method 1 is a method whereby a sub-sequence formed from one or more elements is extracted from a sequence, the sub-sequence including a plurality of numbers, each number contained in the sub-sequence forming a color dot, wherein hue, luminosity, saturation, or a combination thereof, is defined by each type of number, the color dot further being sequentially output in a color dot matrix arranged in a matrix shape, a color pattern obtained from the output of the color dot matrix causing the intrinsic regularity to be revealed, and where display method 2 is a method whereby a sequence is divided into a plurality of sub-sequences, each sub-sequence including a plurality of numbers, each number contained in the divided sub-sequences forming a sub-color dot column wherein hue, luminosity, saturation, or a combination thereof, is defined by each type of number, the sub-color dot columns being arranged in an aligned manner to output a color dot matrix wherein the color dots are arranged in a matrix shape, a color pattern obtained from the output of the color dot matrix causing latent characteristics within the sequence to be revealed.
3. A method as set forth in claim 2, the method being characterized in that, in display method 1 or display method 2, each number forming a sequence I_(j) (j=1, . . . , m) is arranged according to the following positioning pattern:

$$
\begin{array}{l}
(j = 1,\ 2,\ 3,\ \ldots,\ k) \\
(j = k+1,\ k+2,\ k+3,\ \ldots,\ k+k) \\
\qquad \vdots \\
(j = (n-1)k+1,\ (n-1)k+2,\ (n-1)k+3,\ \ldots,\ (n-1)k+k) \\
(j = nk+1,\ nk+2,\ nk+3,\ \ldots,\ nk+k)
\end{array}
$$

(where k is an integer of 2 or more, and n is a natural number such that nk+1 ≤ m ≤ nk+k), a color dot matrix is output, latent characteristics within the sequence being revealed.
4. A method as set forth in claim 3, the method being characterized in that when p is any given natural number less than m, and r is any given natural number, when the display method set forth above is implemented while substituting k=p, p+r, p+2r, p+3r, . . . , a color dot matrix group is output wherein color dot matrices of p, p+r, p+2r, p+3r . . . are all arranged in an aligned manner.
5. A method for evaluating a uniformity of spots on a DNA microarray having a plurality of spots, these spots undergoing specific emissions as a result of the hybridization of target DNA and tagged probe DNA, wherein the uniformity of the spots is evaluated by examining whether patterns having periodicity are manifested in a sequence BG comprising background data obtained, the method comprising: (1) obtaining background data for each spot by applying an analysis software to images obtained by scanning the monochromatic emission of the DNA microarray; (2) for each spot, the corresponding plate No.(α) of the target DNA and position (β, γ) are determined, wherein α is a symbol or a numerical symbol identifying the plate while β and γ respectively stand for the row and column of a matrix formed by plate holes; (3) the No.(α) and the position (β, γ) on the plate are assigned to respective background data; (4) the sequence BG consisting of the background data aligned in the orders of the plate No.(α) (the first priority) and the position (β, γ) on the plate (the second priority) is obtained.
6. A method as set forth in claim 5, the method being characterized in that the patterns in the sequence BG are displayed according to display method 1 or display method 2 described below, whereby it is determined whether the patterns having periodicity are manifested in the sequence BG, where display method 1 is a method whereby a sub-sequence formed from one or more elements is extracted from a sequence, the sub-sequence including a plurality of numbers, each number contained in the sub-sequence forming a color dot, wherein hue, luminosity, saturation, or a combination thereof, is defined by each type of number, the color dot further being sequentially output in a color dot matrix arranged in a matrix shape, a color pattern obtained from the output of the color dot matrix causing the intrinsic regularity to be revealed and where display method 2 is a method whereby a sequence is divided into a plurality of sub-sequences, each of the sub-sequences including a plurality of numbers, each number contained in the divided sub-sequences forming a sub-color dot column wherein hue, luminosity, saturation, or a combination thereof, is defined by each type of number, the sub-color dot columns being arranged in an aligned manner to output a color dot matrix wherein the color dots are arranged in a matrix shape, a color pattern obtained from the output of the color dot matrix causing latent characteristics within the sequence to be revealed.